ABSTRACT

We present the ability of our Spatial Audio Toolkit for Immersive
Environments (SATIE) to simultaneously render real-time audio scenes
that combine several spatialization methods. While object-oriented
audio and Ambisonics are already included in SATIE, we present a
prototype of a directional reverberation method based on impulse
response computation and describe how this method will be included
in SATIE.

1. INTRODUCTION 

A growing number of computer music performance venues are now
equipped with large loudspeaker configurations [1], and therefore
provide new opportunities for artists using 3D audio scene
environments for composition and sound design. This, along with the
recent rise of affordable spatial audio recording devices and
increased interest in virtual reality experiences, gives rise to a
growing need to combine multiple spatialization methods: captures
(live or not) made in different ambisonic formats, mono object-based
audio sources, and flexible and adaptable speaker configurations. We
anticipate that spatial audio composition (targeting performing arts,
installations or any other immersive experience) will evolve to
involve different types of audio sources, such as live audio capture,
field recordings and synthetic audio, together with visual [2] and
haptic [3] correlates of the audio.

Moreover, innovation from the game industry is pushing virtual and
augmented reality forward, approaching spatial audio in an
object-oriented manner: sources are sound objects, located in space
and controlled with low-level parameters such as gain, equalization
and spread. This approach, although effective for speaker array
systems, lacks architectural acoustic responses and adapts poorly to
sound sources without a clear location, such as the sound of a river.
Techniques from 3D graphics are now entering audio and provide
methods for simulating sound based on the physics of soft-body
vibration and sound propagation [4]. Although such simulations are
probably hard to achieve in real time, simulating the acoustic
response of a 3D environment may significantly improve the coherence
between audio sources and the virtual space, while still allowing
real-time, 6-DoF navigation [5]. Ray tracing algorithms are
appropriate for real-time rendering [6] and have the advantage of
including the direction of the sound during auralization [7],
allowing real-time calculation of directional sound reflections.

One of the main challenges for spatial audio rendering today is to
support the multiplicity of i) audio display methods, ii) spatial
audio algorithms and iii) approaches to spatial audio authoring and
6-DoF navigation in spatial audio [8]. To date, however, many
existing real-time 3D audio scene rendering systems, such as COSM
[9], BlenderCAVE [10], Spatium [11], Zirkonium [12], CLAM [13], 3Dj
[14], Panoramix [15] and the spatDiff library [16], mostly focus on
trajectory-based composition with object-oriented audio and sound
fields with ambisonics. The challenge of navigating heterogeneous
spatial audio content is illustrated in Figure 1, where the spatial
audio scene is built from 360° audio/video footage: the sound field,
captured using an ambisonic microphone¹, is mixed with synthetic
audio that is spatialized through an object-oriented approach and
correlated with 3D objects on screen (the white bubbles coming out of
the white vase).

Figure 1: Example of an augmented reality application combining
several spatialization algorithms (ambisonics and object-oriented
audio): a 360° audiovisual capture is rendered simultaneously with
synthetic objects, the bubbles coming out of the white vase.

In this paper, we present how our Spatial Audio Toolkit for
Immersive Environments (SATIE²) addresses the challenge of supporting
several approaches to audio scene rendering, possibly combining
object-based audio, ambisonic formats and architecture-based
acoustic spatialization simultaneously.

2. SATIE 

The development of SATIE (with the SuperCollider language [17]) was
first motivated by the need to render dense and rich audio scenes in
the Satosphere, a large dome-shaped audiovisual projection space at
the Society for Art and Technology [SAT] in Montreal, and to compose
real-time audio/music scenes consisting of hundreds of simultaneous
sources targeting loudspeaker configurations of 32 channels or more,
sometimes with two or more different audio display systems [18].
SATIE adapts easily to different audio display configurations and
supports a plugin architecture, which makes it easily extensible to
new situations. As such, it fills the role of a rapid prototyping
tool for spatial audio composition.

¹ The Zylia ZM-1 microphone.

² https://gitlab.com/sat-metalab/satie, accessed Dec. 2018

Figure 2: Example pipeline of a spatial rendering involving
heterogeneous audio sources: reverberating sources, sound field
sources and sound objects.

Sound sources in SATIE are controlled through unified OSC [19]
messages, which allow lifecycle management of each sound source as
well as control of its (possibly custom) parameters.
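As a rough illustration of what such control traffic looks like on the wire, the sketch below encodes OSC 1.0 messages using only the Python standard library. The addresses `/example/source/create` and `/example/source/set`, the node name `src1`, the synth name and the parameter name are hypothetical placeholders chosen for this example; they are not SATIE's actual OSC namespace, which is not specified here. The host and port passed to `send_udp` are likewise assumptions to be replaced by the values of a real setup.

```python
import socket
import struct


def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes (OSC 1.0 strings)."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *args) -> bytes:
    """Encode one OSC message with int32, float32 and string arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)   # big-endian int32
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)   # big-endian float32
        elif isinstance(arg, str):
            tags += "s"
            payload += _pad(arg.encode("utf-8"))
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


def send_udp(message: bytes, host: str, port: int) -> None:
    """Send an encoded message over UDP (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


# Hypothetical lifecycle: create a source, then set one of its parameters.
create = osc_message("/example/source/create", "src1", "hypotheticalSynth")
update = osc_message("/example/source/set", "src1", "gainDB", -6.0)
```

Keeping creation, parameter updates and deletion in one message format means any OSC-capable tool (a 3D engine, a sequencer, a script) can drive the scene without a dedicated client library.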

3. RENDERING METHODS 

Facing a variety of approaches to composition with dense audio
structures and a variety of audio displays, SATIE implements a
flexible rendering pipeline that allows mixing different audio input
formats and multichannel mastering, and that adapts easily to various
audio displays. We rely mainly on SuperCollider's supernova rendering
engine for multi-threaded operation. Consequently, we have access to
parallel groups [20], which solve some real-time issues with synth
instantiation and bus allocation. SATIE structures different types of
audio processors in layers, represented by a hierarchy of parallel
groups (ParGroups):

• audio sources 

• effects 

• post-processors. 

Audio sources are different types of mono or multichannel audio
generators and players. On the second level are effects, which
usually do not generate sound but modify the signal of audio sources.
Finally, post-processors are meant as a mastering stage, where the
final stages of DSP are performed. In the current implementation, the
post-processors are divided into two groups: one for B-format signals
and one for traditional mono/multichannel signals.

Signals pass between audio sources and effects through busses: the
user allocates auxiliary busses and manages bus access on both the
generator side and the effect side. If any post-processors are
present, all signals are collected there; otherwise, they pass
directly to the spatializer. Multiple spatializers can be used, in
which case SATIE creates the appropriate number of output channels.

Figure 2 shows a rendering pipeline that combines object-based audio
sources, sound field sources and reverberating sources into a
heterogeneous mix.
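The layered routing described in this section can be modelled in a few lines. The following is a toy, offline sketch in Python, not SATIE's SuperCollider implementation: the dictionary keys (`generate`, `send`, `bus`, `process`), the function names and the choice to sum dry and effect signals into one mono mix are all assumptions made for illustration.

```python
def mix(blocks, n):
    """Sum a list of equal-length mono blocks into one block of n samples."""
    out = [0.0] * n
    for block in blocks:
        for i, sample in enumerate(block):
            out[i] += sample
    return out


def render_block(sources, effects, post_processors, n):
    """Run one block through the three layers and return the signal that
    would be handed to the spatializer.

    sources:         dicts with 'generate' (n -> block) and optional 'send'
                     naming an auxiliary bus (hypothetical structure)
    effects:         dicts with 'bus' (bus to read) and 'process'
    post_processors: callables applied in series; may be empty
    """
    buses = {}    # auxiliary bus name -> blocks written to it
    main = []     # everything that reaches the mastering stage

    # Layer 1: audio sources write their dry signal and optional bus sends.
    for src in sources:
        block = src["generate"](n)
        main.append(block)
        bus = src.get("send")
        if bus is not None:
            buses.setdefault(bus, []).append(block)

    # Layer 2: effects read from their bus and modify the signal.
    for fx in effects:
        bus_input = mix(buses.get(fx["bus"], []), n)
        main.append(fx["process"](bus_input))

    # Layer 3: post-processors, if any; otherwise the mix passes straight on.
    out = mix(main, n)
    for post in post_processors:
        out = post(out)
    return out
```

With an empty `post_processors` list the mix is returned unchanged, which mirrors the bypass case described above.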



3.1. Object Based Audio 

Object audio (Figure 3) is the approach most commonly used in the
entertainment industries: each sound source has a clearly defined
position within the coordinate system [21]. SATIE supports different
types of object-based audio sources, such as mono audio, mono live
input sources and synthesized sounds [22]. The spatializers handling
object audio expect azimuth, elevation and gain for panning each
audio object.

SATIE was initially designed to render large numbers of mono audio
sources, optionally with effects, to large multichannel loudspeaker
systems. Audio sources and effects can be placed in groups and
controlled either per group or individually. Spatializers take mono
signals and place them on different channels according to azimuth,
elevation and gain parameters. The post-processing audio object is
comparable to mastering effects in a studio or live pipeline,
typically limiting, compressing or normalizing signals.
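To make the role of the azimuth, elevation and gain parameters concrete, here is a minimal Python sketch of a naive dot-product amplitude panner. It is neither VBAP nor one of SATIE's spatializers; the function names, the `sharpness` exponent, the power normalization and the angle convention (azimuth counter-clockwise from the front, elevation upward, both in degrees) are assumptions made for this example.

```python
import math


def direction(az_deg, el_deg):
    """Unit vector for an azimuth/elevation pair given in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def pan_gains(az_deg, el_deg, gain, speakers, sharpness=4.0):
    """Per-speaker gains for one mono source.

    speakers is a list of (azimuth, elevation) pairs in degrees. Speakers
    facing away from the source get zero gain; the remaining gains are
    scaled so that the summed power equals gain squared.
    """
    src = direction(az_deg, el_deg)
    raw = []
    for spk_az, spk_el in speakers:
        spk = direction(spk_az, spk_el)
        dot = sum(a * b for a, b in zip(src, spk))
        raw.append(max(0.0, dot) ** sharpness)
    norm = math.sqrt(sum(g * g for g in raw))
    if norm < 1e-12:
        return [0.0] * len(speakers)
    return [gain * g / norm for g in raw]


# Hypothetical horizontal quad layout: front-left, rear-left, rear-right,
# front-right.
QUAD = [(45.0, 0.0), (135.0, 0.0), (-135.0, 0.0), (-45.0, 0.0)]
front_gains = pan_gains(0.0, 0.0, 1.0, QUAD)
```

A source at the front of the quad layout is shared equally between the two front speakers, while a source aligned with one speaker is sent almost entirely to that speaker.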

All parameters (audio-object-specific as well as spatialization) can
be modified either directly from the SuperCollider language or
through OSC; our preferred method is to use a 3D engine for
“volumetric” control of the sources as well as for the actual
geometry computation. In line with this object-based approach, and by
balancing the load of the physics computation, we were able to use
particle swarms of hundreds of simultaneous sound sources.

3.2. Ambisonics 

The ambisonic pipeline (Figure 4(a)), implemented via the SC-HOA
plugins/quark³, provides the means to play multichannel files and
live audio inputs, encode mono signals into B-format signals and
transcode between different ambisonic formats (ACN and FuMa). It
supports B-format up to order 5.
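For readers unfamiliar with these operations, the following Python sketch shows first-order encoding of a mono sample, conversion between ACN and FuMa channel layouts, and a yaw rotation of the encoded field. It is limited to first order and does not use SC-HOA. It assumes SN3D normalization for the ACN-ordered signal (the normalization is not stated here) and the usual ambisonic convention of azimuth measured counter-clockwise from the front; all function names are hypothetical.

```python
import math


def encode_first_order(sample, az_deg, el_deg):
    """Encode one mono sample as (W, Y, Z, X): ACN order, SN3D assumed."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    w = sample
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    x = sample * math.cos(az) * math.cos(el)
    return (w, y, z, x)


def acn_sn3d_to_fuma(frame):
    """Reorder (W, Y, Z, X) to FuMa (W, X, Y, Z) and scale W by 1/sqrt(2)."""
    w, y, z, x = frame
    return (w / math.sqrt(2.0), x, y, z)


def fuma_to_acn_sn3d(frame):
    """Inverse of acn_sn3d_to_fuma for first-order frames."""
    w, x, y, z = frame
    return (w * math.sqrt(2.0), y, z, x)


def rotate_yaw(frame, angle_deg):
    """Rotate an ACN-ordered first-order frame about the vertical axis."""
    w, y, z, x = frame
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return (w, y_rot, z, x_rot)
```

Rotating the encoded field by an angle gives the same frame as encoding the source at its azimuth plus that angle, which is why rotation can be applied once to a whole sound field rather than per source.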

SATIE supports ambisonics with the same approach to the signal path.
The ambisonic audio input can be sent to ambisonic effects and
post-processors such as rotation, mirroring and beamforming
filtering. The significant cost of ambisonic decoding is paid only
once, since decoding is not embedded in each ambisonic source
pipeline but takes place at the post-processor stage. We can also
transcode between different ambisonic orders.

³ https://github.com/florian-grond/SC-HOA

Figure 4: SATIE pipeline involving ambisonics. (a) Pipeline for
ambisonic sources. (b) Pipeline for convolution based spatialization.

3.3. Reverberating Sources with Convolution Reverb 

Having various audio rendering methods driven by 3D engines invites
the simulation of acoustic spaces. Consequently, we have started
developing a tool for real-time generation of impulse responses (IR)
through ray tracing, with the idea of integrating the IR workflow
with SATIE. Figure 5(a) shows a screenshot of a real-time rendered
frame in which the listener is facing a sound source, represented by
a cube, at the end of the hallway. Figure 5(b) shows a wireframe view
of a simple model (not related to the picture on the left) that
illustrates the underlying process. The black dots on the inner faces
of the model represent the impact points of the rays on the walls of
the 3D model. Sound sources and the listener are not shown; the
figure simply shows a point cloud mapped onto the model for
reference. This implementation uses another custom software,
vaRays⁴, which shares the 3D model with the 3D engine (in this case
EIS), receives the coordinates of the sound sources and the listener,
and writes IR files to disk. The IR files are read by SATIE, which
continuously replaces the buffer read by SuperCollider's PartConv
UGen. A crude prototype of this process (using mono convolution) is
demonstrated in the following video: https://vimeo.com/306202441.

Besides mono IR, we can also generate Ambisonic IR (AIR), although at
the time of writing this process has not yet been integrated into
SATIE.
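The idea of convolving a running signal while the impulse response is replaced underneath it can be sketched as follows. This Python example uses direct time-domain convolution with overlap-add, not the FFT-based partitioned algorithm of PartConv, and it takes the IR as an in-memory list rather than reading it from the files written by the ray tracer. The class and method names are hypothetical.

```python
class BlockConvolver:
    """Block-wise mono convolution whose impulse response can be swapped
    between blocks. The reverb tail of earlier blocks is kept and added to
    later output, so a swap does not truncate sound already in flight."""

    def __init__(self, ir):
        self.set_ir(ir)
        self.tail = []   # output samples that belong to future blocks

    def set_ir(self, ir):
        """Replace the impulse response used for subsequent blocks."""
        if len(ir) == 0:
            raise ValueError("impulse response must not be empty")
        self.ir = list(ir)

    def process(self, block):
        """Convolve one input block and return an output block of equal size."""
        n = len(block)
        out = [0.0] * max(n + len(self.ir) - 1, len(self.tail))
        # Direct convolution of this block with the current IR.
        for i, x in enumerate(block):
            for j, h in enumerate(self.ir):
                out[i + j] += x * h
        # Overlap-add the tail left over from previous blocks.
        for i, t in enumerate(self.tail):
            out[i] += t
        self.tail = out[n:]
        return out[:n]
```

Swapping the IR abruptly between blocks, as done here, can produce audible discontinuities when two successive responses differ strongly, which is one reason a smoother transition between IR instances is desirable.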

4. CONCLUSION 

This paper outlined some of our approaches to heterogeneous audio
scenes consisting of different types of audio input sources and
multichannel displays, and described the SATIE functionalities that
support them. We have also described our approach to Ambisonic
Impulse Response (AIR) generation in vaRays, which aims to enable
ambisonic acoustic simulation. vaRays is still at a very early stage
of development and needs proper support for material-based
diffraction and diffusion.

⁴ https://gitlab.com/sat-metalab/varays

Figure 4(b) shows the general workflow, where the AIR is applied to a
mono sound source and spatialized using the usual SATIE pipeline.
There is still some work left to fully integrate vaRays into the
SATIE pipeline (both IR and AIR). One area to explore is the
interpolation of IR instances in order to compensate for real-time
changes in the listener and sound source locations. This process can
be mixed with other types of rendering, which provides sufficient
creative liberty to the user. The way IR and AIR are delivered to
SATIE also needs work, as file I/O is not optimal. We will be looking
into sending OSC blobs. Another path would be sharing buffers between
SATIE and vaRays using our shared memory library SHMDATA⁵. Another
desired functionality is rendering VBAP spatialization into B-format
signals.
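How IR instances would be interpolated is left open here. As one simple possibility, and purely as an assumption of this sketch, two impulse responses can be blended linearly, sample by sample, with the shorter one zero-padded. Because convolution is linear, convolving with the blended IR is equivalent to crossfading the outputs obtained with each IR separately. The function name is hypothetical.

```python
def interpolate_ir(ir_a, ir_b, t):
    """Linear blend of two impulse responses.

    t = 0.0 returns ir_a, t = 1.0 returns ir_b (both zero-padded to the
    longer length); values in between mix the two sample by sample.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    n = max(len(ir_a), len(ir_b))
    a = list(ir_a) + [0.0] * (n - len(ir_a))
    b = list(ir_b) + [0.0] * (n - len(ir_b))
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


# Hypothetical usage: move from one IR to another in four steps.
steps = [interpolate_ir([1.0, 0.5], [0.0, 0.0, 1.0], k / 4) for k in range(5)]
```

A sample-wise blend does not shift the arrival times of reflections, so it is best seen as a click-avoidance measure between nearby positions rather than a physically accurate interpolation.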